The ubiquity of power-law relations in empirical data reflects physicists’ love of simple laws and of
uncovering common causes among seemingly unrelated phenomena. However, many reported power
laws lack statistical support and mechanistic backing, and discrepancies with real data
are often explained away as corrections due to finite size or other variables. We propose a simple
experiment and rigorous statistical procedures to look into these issues. Making use of the fact
that the occurrence rate versus pulse intensity of crumple sound obeys a power law with an exponent
that varies with material, we simulate a complex system with two driving mechanisms by crumpling
two different sheets together. The probability function of crumple sound is found to cross over from
two power-law terms to a bona fide power law as compaction increases. In addition to showing the
vicinity of these two distributions in the phase space, this observation nicely demonstrates how
interactions can bring about a subtle change in macroscopic behavior, and that more information may
be retrieved if the data are sorted. Our analyses are based on the Akaike information
criterion, which is a direct measure of information loss and emphasizes the need to strike a balance
between model simplicity and goodness of fit. As a show of force, the Akaike information criterion
also found that the Gutenberg-Richter law for earthquakes and the scale-free models for brain functional
network, 2-dimensional sand pile, and solar flare intensity suffer excessive loss of information.
They resemble more the crumpled-together ball at low compactions in that there appear to be two
driving mechanisms that take turns occurring.

PACS numbers: 05.45.-a, 89.75.Fb, 05.40.Ca, 64.60.av

I. INTRODUCTION

It is a deeply established tradition in physics to search for unifying laws, for universal
principles that can bypass the specificity of particular systems to capture the underlying
unity of the world. A contemporary pursuit concerns the abundant simple power-law (SPL)
distributions [1], $g(x) = a/x^{\beta}$, over a wide range of magnitudes that have surfaced in
$1/f$ noise [2, 3], economy [4], distribution of income and wealth among the population [5],
foraging patterns of sharks and tuna [6], and brain activity and heart rate [7], to name just
a few. One attempt to explain their deeper origin is the concept of self-organized
criticality proposed by Bak, Tang, and Wiesenfeld [8] in 1987. The power-law distribution of
sand avalanches and the fact that sand piles can come back to the critical slope without
deliberate tuning of parameters have been a paradigm for self-organized criticality, although
the dynamics of a real sand pile has been demonstrated [9] to behave more like a first-order
transition. Another notable approach is the use of the renormalization group [10], motivated
by the resemblance to the power-law divergence of physical quantities, such as specific heat,
susceptibility, and correlation length, with universal critical exponents in systems
undergoing a smooth phase transition. In spite of many more generative models for various
reported power laws [11], statistical support and mechanistic sophistication are in dire need
of improvement [12]. Faced with these deficiencies, it is not surprising that the relevance
and usefulness [13] and legitimacy [3] of some power-law claims have been called into question.

Crackling noise from candy wrappers and food bags
is something we all hate in the cinema. Its occurrence
rate versus pulse intensity has also been reported [14, 15] to obey the power law and may
have bearing[T7] on the Gutenberg-Richter law[TH] for earthquakes. In Section II, we shall
introduce two versions of crumpling experiments. In the first one the sound data are
collected from two separately crumpled thin sheets. Since the power-law exponents are
distinct for different materials, we are sure that the combined data should be fit by double
power laws (DPL), $g(x) = a_1/x^{\beta_1} + a_2/x^{\beta_2}$. However, when prepared in a
log-log plot, the data points turned out to line up in an approximately straight line, and
all our colleagues congratulated us for having discovered a new power law. This incident
alerted us to search for a more rigorous criterion for power law and in the meantime
reexamine the existing examples. Pedagogical derivations are given in Section III to explain
mathematically why the combination of two different power laws should look so tantalizingly
similar to a simple power law. This paves the way for the introduction in Section IV of more
rigorous statistical procedures that are capable of picking out the better fitting function
among a group of competing candidates. In our second experiment two different sheets are
truly crumpled together. This is briefly introduced in Section II, but can be fully
investigated only once we are equipped with the new information criterion. A change of
statistical property is expected when the interactions between these two sheets increase.
Initially they exhibit different power-law exponents, but as crumpling proceeds the crumpled
ball should reach a compact state that is indistinguishable from that of a single (composite)
sheet. In other words, we anticipate a transition of macroscopic behavior from a double to a
single power law as a consequence of intensifying interactions. Conclusion and discussion are
arranged in Section V. Alternative and mathematically more rigorous derivations for the
results in Section III are added in Appendix A. The reasoning behind the likelihood ratio
test, employed to make sure the statistical evidence is strong enough to dismiss the existing
model, is discussed in Appendix B. While the main text addresses only probability density
functions, the statistical method we introduce in Section IV can be generalized to discuss
phenomena that do not involve probability or for which the raw data are not available. The
procedures are detailed in Appendix C.


II. CRUMPLE SOUND EXPERIMENT AND 
THEORETICAL MODEL 

We performed the crumple sound experiment inside a soundproof chamber lined with foam
rubber planks on the interior to avoid echo. Inside the chamber a microphone was connected
to a Sony ICD-PX333 recorder, and the crumple sound was recorded at a sample rate of 44,100
points per second in 16-bit precision. The amplitude is measured in computer units (c.u.)
and the maximum amplitude ($A_{\max}$) is 2^® - 1. The gain of the sound card is constant
and the sample is crumpled manually at a distance of 10 cm from the microphone. We used
aluminum foil (Al), High Density Polyethylene (HDPE), and A4 copy paper as our samples. They
are of thickness 16, 13, and 60 μm respectively, and cut into squares (20 cm × 20 cm) for
uniformity. Crumpling is kept at a slow rate of about 90 seconds per sheet to facilitate the
separation of individual pulses. Care is taken to avoid friction noise caused by the
relative motion between hand and sample.

A MATLAB program was used to convert sound to signal amplitude. To estimate the background
noise and dc offset, we started the recording 5 seconds prior to each round of crumpling.
The average amplitude is about 10“^ as normalized by $A_{\max}$, and thrice this amount was
set as the noise threshold. A C program automatically integrates the sound intensity every
200/44100 second. When the value exceeded that of the background noise, the beginning of a
new pulse was marked. Whenever it was unclear whether we were seeing one long pulse or two
overlapping pulses, a more scrupulous criterion was applied. We resorted to a smaller time
step to examine the grey area by including just six amplitude peaks. Since this value is
expected to decrease as a pulse fades, a sudden switch to an increasing function indicates
the beginning of a second pulse.
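The windowed thresholding described above can be sketched as follows. This is an illustrative Python version, not the C program used in the experiment: the function name `detect_pulses`, the use of the summed squared amplitude as "intensity", and the way the threshold is passed in are all assumptions, and the finer six-peak test for overlapping pulses is not reproduced.

```python
def detect_pulses(samples, window=200, threshold=0.0):
    """Split an amplitude sequence into pulses by windowed energy.

    The energy (sum of squared amplitudes, an assumption) is integrated over
    consecutive windows of `window` samples; 200 samples at 44,100 Hz matches
    the 200/44100 s step in the text. A pulse starts when the windowed energy
    rises above `threshold` and ends when it falls back to or below it.
    Returns a list of (start_window, end_window, total_energy) tuples.
    """
    energies = [
        sum(a * a for a in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
    pulses = []
    start = None
    total = 0.0
    for w, energy in enumerate(energies):
        if energy > threshold:
            if start is None:          # rising edge: a new pulse begins
                start = w
                total = 0.0
            total += energy
        elif start is not None:        # falling edge: close the pulse
            pulses.append((start, w, total))
            start = None
    if start is not None:              # pulse still open at end of record
        pulses.append((start, len(energies), total))
    return pulses
```

In practice the threshold would be derived from the 5-second quiet segment recorded before each run; here it is simply a parameter.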

We agree with Houle and Sethna [15] that crumple sound is emitted when facets suddenly
buckle from one configuration to another and is not necessarily accompanied by the creation
of a new ridge. To distinguish these sound-generating surfaces from the facets encircled by
the ridges, we shall call the former “drums”. Kramer and Lobkovsky [14], who used paper that
had been crumpled and uncrumpled thirty times, have demonstrated that drums are a different
entity from facets: a drum may comprise many facets. To understand the origin of the power
law for the occurrence rate, let us imagine a sheet of unit area and call this a size-1
drum. In the process of crumpling, drums of smaller sizes will appear and have their chance
to emit sound at random times. Overall, we have $2^n$ drums of size $1/2^n$, where $n$ is a
nonnegative integer. Presumably, bigger drums sound louder and we can assume the intensity of
crumple sound to be proportional to the drum area; namely, $I \sim 1/2^n$. For the sake of
simplicity, we restrict each drum to emit sound only once. The net number of sound pulses
measured in the crumpling experiment consequently equals the total number of drums:

This simple model [TE| readily predicts a power law with $\beta = 2$ for the occurrence rate
versus sound intensity. By allowing some drums to go mute or be beaten multiple times, the
exponent can be tailored to match the empirical values.

FIG. 1: (color online) The normalized occurrence frequency is plotted against pulse energy
for the crackling sound from a singly crumpled and wrung HDPE sheet. The red solid fitting
line represents a shifted power law, while the green dotted line represents a simple power
law. Note that both lines share the same slope at large energies, which implies their
exponents are identical at about 2.653±0.006. See Table I for further detail on the other
two samples.
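The counting argument of the drum model can be written out explicitly. The lines below are a sketch of one route to $\beta = 2$, not necessarily the form of the authors' own equation; the symbols $I_n$, $\Delta I_n$, and $n_{\max}$ are introduced here for illustration, and the proportionality constant between intensity and drum area is set to one.

```latex
% 2^n drums of area 2^{-n}, each emitting once with intensity I_n = 2^{-n}.
\begin{align}
N_{\rm total} &= \sum_{n=0}^{n_{\max}} 2^{n}, \qquad I_n = 2^{-n}, \\
\Delta I_n &= I_n - I_{n+1} = \tfrac{1}{2}\, I_n, \\
g(I_n) &\simeq \frac{2^{n}}{\Delta I_n} = \frac{2}{I_n^{2}} \;\propto\; I_n^{-2}.
\end{align}
```

The number of pulses per unit intensity interval therefore falls off as $I^{-2}$, because both the count $2^n = 1/I_n$ and the inverse level spacing $1/\Delta I_n$ scale as $1/I_n$.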

Unlike Ref. [15], we found the crumple sound to differ from that from wringing via the
cylinder geometry (see Fig. 1). A power law can fit the wring sound nicely, but the crumple
data exhibit an obvious downturn in the full-log plot and resemble more a shifted power
law. We believe the discrepancy is due to the fact that a crumpled ball contains multiple
layers that shield and cut down the sound intensity. Placing one or both hands over the
mouth is enough to convince oneself that the attenuation by shielding must be a sizable
factor. In order to quantify the effect of attenuation, we wrap the thin sheet around a
mini-speaker before crumpling into different compactions. The extent of wrinkling is
therefore not fixed, but increases with the number of layers. We define the attenuation
ratio as the deviation from unity of the ratio between the muffled intensity and that of our
prerecorded sound. The results are shown in Fig. 2, which lists paper as being most
effective at dissipating the acoustic energy among the three materials for the same number
of layers.

FIG. 2: (color online) The ability of different materials to shield sound as more layers
are accumulated. A mini-speaker was inserted inside the crumpled balls to determine the
attenuation ratio, as defined in the text.

FIG. 3: (color online) Standard deviation plotted against fitting exponent of power law for
crumple sound. $(\Delta\beta)^2$ is proportional to the ratio of $\sigma^2(\beta)$ and its
curvature. (a) Paper (in red circle) and Al (in black square) and their combined data (in
green cross). The optimal $\beta$ can be read off from the minima position to be 1.68, 1.22,
and 1.44. Combined data exhibit a large curvature, which results in a misleadingly small
$\Delta\beta = 0.0126$, compared to 0.0198 and 0.0200 for paper and Al. (b) Paper and HDPE
(in blue triangle, $\beta = 1.94$, $\Delta\beta = 0.0143$) and their combined data
($\beta = 1.84$, $\Delta\beta = 0.0128$).

Due to the shielding effect, the straight line typical of a power law in a full-log plot
bends downward for the crumple data because weak sounds are mostly measured in the later
stage of crumpling. This is when the layer number reaches its maximum, and so we expect
these data points to deviate from the straight line more than their loud counterparts. This
changes the distribution function from a power law to a shifted power law, or the
Zipf-Mandelbrot distribution [12] (ZMD), $g(x) = a/(x + \gamma)^{\beta}$, where $\gamma$ is
a parameter that measures the extent of attenuation. However, since ZMD reduces to a power
law as $x$ gets large, the $\beta$ value determined from ZMD will be identical to that for
the “true” model, namely, the power law without the artifact and correction due to
shielding. This is verified by Fig. 1.
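The claim that ZMD reduces to a power law at large $x$ can be checked numerically through the local slope in a full-log plot, which for $g(x) = a/(x+\gamma)^{\beta}$ is $-d\ln g/d\ln x = \beta x/(x+\gamma)$. The sketch below is only an illustration: the helper names are hypothetical, and the HDPE values $\gamma = 0.0068$, $\beta = 2.653$ from Table I are used with arbitrary sample points for $x$.

```python
import math

def zmd(x, a, gamma, beta):
    """Zipf-Mandelbrot (shifted power-law) density, g(x) = a / (x + gamma)**beta."""
    return a / (x + gamma) ** beta

def local_log_slope(g, x, h=1e-4):
    """Estimate -d ln g / d ln x at x by a central difference in ln x."""
    up = math.log(g(x * (1 + h)))
    down = math.log(g(x * (1 - h)))
    return -(up - down) / (math.log(1 + h) - math.log(1 - h))

# Illustrative parameters (HDPE row of Table I); x values are arbitrary.
gamma, beta = 0.0068, 2.653
g = lambda x: zmd(x, 1.0, gamma, beta)
for x in (0.001, 0.01, 0.1, 1.0, 10.0):
    print(x, local_log_slope(g, x), beta * x / (x + gamma))
```

For $x \gg \gamma$ the local slope approaches $\beta$, while for $x \lesssim \gamma$ it is much shallower, which is the downward bend seen in the crumple data.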

Having established the protocol for analyzing the sound data, we move on to truly crumple
two different sheets together. Although the value of $\beta$ has been determined in Fig. 3
to vary with material, there is no knowing how the twisting and mingling with another sheet
will affect the statistical behavior. This simple experiment is designed to simulate a
complex system with two driving mechanisms and allow confirmation of the general belief and
model prediction [20] that interactions can bring out a subtle change in macroscopic
behavior.

Crumpling is more suitable than wringing here because it allows the two sheets to fully
interact with each other. We oriented the two sheets in perpendicular directions prior to
crumpling to prevent them from “sticking” and becoming a single composite sheet from the
beginning. Care was also taken to avoid phase separation; i.e., we made sure that both
sheets mixed thoroughly and were distributed evenly, as demonstrated by Fig. 4. As a
control, a similar analysis was repeated for sheets without interactions; namely, the sheets
were crumpled separately by individual hands. It turned out that both data sets lined up
tantalizingly straight in the full-log plot, a revealing sign of power law, while the
control group ironically enjoyed the smaller error, $\Delta\beta$.


III. USING A SMALL $\Delta\beta$ AS AN INDICATION
FOR POWER LAW IS PRONE TO ERROR


In order to avoid this error, we need to go back and understand how $\Delta\beta$ was
calculated. The magnitude of $\beta$ comes from maximizing the likelihood function $L$,
while its error $\Delta\beta$ is estimated by
$(-\partial^2 \ln L/\partial\beta^2)^{-1/2}$. By use of Eq. (C3), this formula gives

$(\Delta\beta)^2 = \dfrac{2\sigma^2}{N} \left( \dfrac{\partial^2 \sigma^2}{\partial\beta^2} \right)^{-1}$    (1)

where $\sigma$ denotes the standard deviation. Now imagine two independent sets of power-law
data, $y_i = 1/x_i^{\beta_1} + \Delta y_i$ and $z_i = 1/x_i^{\beta_2} + \Delta z_i$, where
$\beta_1 \neq \beta_2$ and $i = 1, \cdots, N$ labels the slices in the histogram. For
simplicity, let us suppose the random numbers $\Delta y_i$ and $\Delta z_i$ render
relatively small $\Delta\beta_k/\beta_k$ for $k = 1, 2$. According to Eq. (1),

$(\Delta\beta_k)^2 = \ldots, \qquad k = 1, 2$    (2)

in which a statistical average over the random numbers is implied. If both data are combined
and fit by a single power law out of ignorance, Eq. (1) gives

$(\Delta\beta)^2 = \dfrac{\sum_i \left[ (a x_i^{-\beta} - \ldots)^2 + (\Delta y_i)^2 + (\Delta z_i)^2 \right]}{\ldots \sum_i (\ldots)(\ln x_i)^2}$    (3)

as derived under a general scenario in Appendix A.

FIG. 4: (color online) Aluminum foil and A4 paper (colored to enhance the contrast) are
crumpled together by hand. Figures (a) and (b) show the exterior of the crumpled ball at low
and high compactions, while (c) and (d) show their interior. The different characteristics
of Al and paper are retained in (a) and (c), for which the occurrence rate of crumple sound
is found to obey double power laws. When inter-sheet interactions intensify, individual
properties get erased and the system transits to obeying a power law in (b) and (d).

In the case of perfect power laws, i.e., $\Delta y_i = \Delta z_i = 0$ and
$\Delta\beta_{1,2} = 0$, Eq. (3) predicts $\Delta\beta > \Delta\beta_i$, $\forall i = 1, 2$,
as expected for our rash move. However, as long as $\Delta\beta_i \neq 0$ and exceeds about
2 x 10 “^, the ratio of the first terms in the numerator and the denominator of Eq. (3)
becomes smaller than the subsequent ratios of the second and third terms. A closer look
reveals that the second and third ratios are simply $\Delta\beta_1$ and $\Delta\beta_2$ from
Eq. (2). It takes only simple arithmetic to confirm that
$\Delta\beta < \max\{\Delta\beta_1, \Delta\beta_2\}$. This counterintuitive result stems
from the misuse of Eq. (1) for $(\Delta\beta)^2$ when the fitting model is wrong. In such a
model-misspecification situation, a correction for $(\Delta\beta)^2$ should be adopted using
Huber’s sandwich estimation [21], but applied users are seldom aware of this issue.

Figure 3 shows the standard deviation $\sigma$ as a function of $\beta$ for Al, paper, and
HDPE and the combination of their data, all modeled by the power law. The much larger
curvature for the combined data results in a smaller $\Delta\beta$ according to Eq. (1) in
spite of a large $\sigma$. Because their $\beta$ are distinct, the correct fitting function
is DPL rather than SPL. However, since increasing the number of fitting parameters almost
always improves the standard deviation, we cannot rely on the likelihood function alone to
measure their relative fitting performance. It is thus imperative to seek another
information criterion that also takes into account the principle of parsimony or model
simplicity.
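How closely a sum of two power laws can mimic a single one in a full-log plot is easy to illustrate numerically. The sketch below uses noise-free synthetic data with unit amplitudes and the exponents 1.22 and 1.68 quoted for Al and paper in Fig. 3; the range of $x$, the number of points, and the helper name `loglog_fit` are assumptions made for illustration, and an unweighted least-squares line is used instead of the maximum-likelihood fit of the text.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares straight line through (ln x, ln y).

    Returns (exponent, r_squared), where exponent is minus the fitted slope.
    """
    t = [math.log(x) for x in xs]
    u = [math.log(y) for y in ys]
    n = len(t)
    t_mean = sum(t) / n
    u_mean = sum(u) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_uu = sum((ui - u_mean) ** 2 for ui in u)
    s_tu = sum((ti - t_mean) * (ui - u_mean) for ti, ui in zip(t, u))
    slope = s_tu / s_tt
    r_squared = s_tu * s_tu / (s_tt * s_uu)
    return -slope, r_squared

# Three decades of x, log-spaced; sum of two pure power laws without noise.
xs = [10 ** (3 * i / 49) for i in range(50)]
ys = [x ** -1.22 + x ** -1.68 for x in xs]
exponent, r_squared = loglog_fit(xs, ys)
print(exponent, r_squared)
```

The fitted exponent lands between the two true exponents and the coefficient of determination stays very close to one, so visual straightness in a full-log plot cannot by itself distinguish SPL from DPL.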


IV. STATISTICAL ANALYSES BASED ON THE 
AKAIKE INFORMATION CRITERION 

Founded on information theory, the Akaike information criterion [22] came to our rescue. It
quantifies the relative fitting performance of distribution functions, $g(x)$, for a given
set of data. Based on the Kullback-Leibler distance [23], which measures the information
loss when using $g(x)$ to approximate the true distribution, the AIC value is defined as

$\mathrm{AIC} = 2k - 2\ln L$    (4)

in which the first term penalizes the abuse of free parameters, $k$, and the likelihood
function $L$ rewards goodness of fit. A smaller AIC value indicates less information loss.
This trade-off between goodness of fit and model simplicity bears resemblance to the
Helmholtz free energy, $F = U - TS$ where $T$ denotes the temperature, for canonical
ensembles in equilibrium statistical mechanics. Whereas Eq. (4) balances goodness of fit
against model simplicity, $F$ balances the competing trends of minimizing internal energy
$U$ and maximizing entropy $S$.

The likelihood function is defined as

$L = \prod_{i=1}^{n} g(x_i)$    (5)

in which $n$ is the number of raw data and $g(x_i)$ the fitting function. After the data
have been grouped into a histogram of $N$ slices, the log-likelihood function becomes

$\ln L = n \sum_{j=1}^{N} f(x_j) \ln g(x_j)$,    (6)

in which $n f(x_j)$ denotes the count in the $j$-th slice. Since both $f(x)$ and $g(x)$ are
meant to describe probability density functions, it is important to remember to impose the
normalization: $\sum_i f(x_i) = \int g(x)\,dx = 1$.
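Equations (4) and (6) translate almost directly into code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the model is normalized by a discrete sum over the bin centres rather than by the integral used in the text, and the histogram counts are made up so that they follow $1/x$ exactly.

```python
import math

def binned_log_likelihood(counts, g_values):
    """Eq. (6): ln L = sum_j [n f(x_j)] ln g(x_j), with n f(x_j) the bin count."""
    return sum(c * math.log(g) for c, g in zip(counts, g_values) if c > 0)

def aic(k, log_likelihood):
    """Eq. (4): AIC = 2k - 2 ln L, with k the number of free parameters."""
    return 2 * k - 2 * log_likelihood

def normalized_power_law(xs, beta):
    """SPL probabilities on bin centres xs, normalized to sum to one (assumption)."""
    weights = [x ** -beta for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up histogram whose counts fall off exactly as 1/x.
xs = [1.0, 2.0, 4.0, 8.0]
counts = [80, 40, 20, 10]
for beta in (1.0, 2.0):
    ll = binned_log_likelihood(counts, normalized_power_law(xs, beta))
    print(beta, aic(1, ll))
```

Both candidates have one free parameter, so the comparison reduces to the likelihood term and the exponent matching the data ($\beta = 1$) yields the smaller AIC. Candidates with more parameters, such as DPL or ZMD, would enter with a larger $k$.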

Figure 5 illustrates the schematic relationship between DPL and ZMD, and other functions
[24] that are often checked against SPL. The ZMD and DPL are generalized versions of SPL,
unlike the exponential, Poisson, and log-normal distributions, which are the simplest forms
in their own categories. Being closer to the data in Fig. 5, DPL and ZMD enjoy less
information loss, although they are more complex than SPL.

Armed with the Akaike information criterion, we can quantitatively demonstrate that ZMD
indeed fits the singly crumpled data better than SPL and DPL; see Table I. We then did a
more thorough analysis on the crumpled-together data by separating them into eight time
stages in Fig. 6. A transition from DPL to SPL was revealed as compaction increases, which
confirms the prediction by Gleeson et al. [20] that interactions can bring out a subtle
change in macroscopic behavior. Note that the two sheets already mingle with each other
considerably at the stage when DPL was observed, as shown by Fig. 4(c), and their emitted
sounds could easily pass for a power law, if not for the scrutiny of the Akaike information
criterion. The eventual switch to SPL can be understood as being characteristic of a singly
crumpled composite sheet molded by the strong inter-sheet interactions. Since the exponents
of DPL determined by the likelihood function match the data for the singly crumpled sheets,
the second power law cannot be dismissed as a correction to SPL due to some relevant
variable's].

TABLE I: Comparing AIC values of different models for singly crumpled sheets. The parameters
of selected model are highlighted in boldface.

Sample   SPL                  DPL                                       ZMD
         AIC       $\beta$    AIC       ($\beta_1$, $\beta_2$)          AIC       ($\gamma$, $\beta$)
HDPE     17379.2   1.732      16913.9   (2.373, 2.331)                  16802.8   (0.0068, 2.653)
Paper    11807.8   1.565      11769.4   (2.051, 1.894)                  11765.9   (0.0020, 1.797)
Al       14383.9   1.298      14376.8   (2.249, 1.362)                  14374.4   (0.001, 1.369)

TABLE II: Comparing AIC values of different models for celebrated power-law claims. The
parameters of selected model are highlighted in boldface. The magnitudes of $a_{1,2}$ are
comparable, so only their signs are included for brevity.

Phenomenon                          SPL                     DPL                                                           ZMD
                                    AIC          $\beta$    AIC          ($a_1$, $a_2$)   ($\beta_1$, $\beta_2$)          AIC          ($\gamma$, $\beta$)
Earthquake [26]                     133115.7     1.03       133112.9     (+, +)           (1.10, 0.76)                    133117.7     (-0.01, 1.03)
Brain Functional Network [27]       11456.73     2.33       11420.49     (+, +)           (3.42, 1.62)                    11422.94     (-3.33, 1.82)
Solar Flare Intensity [11]          72028.82     2.10       71431.24     (+, +)           (3.76, 1.75)                    71439.20     (-37.07, 1.74)
2-dimensional Sand pile Model [28]  1266784      1.01       1266651      (-, +)           (0.77, 0.91)                    1266656      (0.17, 1.04)
Solar Flare Rate [29]               54747.96     1.10       54561.22     (+, -)           (0.87, 0.77)                    54201.22     (48.08, 2.10)
Web Link [30]                       1095357310   1.72       1094847304   (+, +)           (1.46, 1.71)                    1087235440   (0.75, 2.03)
Protein-Domain Frequency [19]       17445.29     0.56       17427.53     (+, -)           (0.85, 0.94)                    17422.58     (3.89, 0.81)
Stock-Market Fluctuation [31]       18275.57     3.25       18258.62     (-, +)           (2.43, 3.02)                    18200.94     (1.70, 5.82)

FIG. 5: (color online) Schematic relation of information loss by different models. The
number of free parameters is indicated in the parenthesis following each distribution. The
distance between each point and the data in this model space reflects the amount of
information loss as measured by the Kullback-Leibler distance. The dashed line traces out a
set of distributions in their simplest form. In contrast, DPL and ZMD are on the same green
solid line as SPL since they belong to the same category.

FIG. 6: (color online) The Akaike-information-criterion analyses of sound for two materials
crumpled together. Data were divided into eight time stages with 1 being the earliest. The
y-axis shows the difference in AIC value between single and double power laws. As compaction
and inter-sheet interactions increase, the macroscopic behavior of the crumpled ball
transits from favoring the latter to the former distribution.

In Table II we highlight more power-law claims that suffer excessive loss of information.
Fits with up to ten terms have been checked and found not to be the winner, except in the
Zipf’s law for word frequency [32], where quadruple power laws were found to retain the most
information. The Akaike information criterion concludes that DPL should replace SPL in
earthquake [33], brain functional network, solar flare intensity, and the 2-dimensional sand
pile model (exemplar of self-organized criticality).

For the data in Fig. 7(a) that span 33 years until 2013, the Gutenberg-Richter law predicts
a probability of 0.000260 for earthquakes [26] of Richter scale 6 to 8 to occur. But
according to the DPL parameters in Table II, the forecast moves up almost twofold to
0.000464. This discrepancy is statistically significant enough to warrant attention. It
should be noted that Schorlemmer et al. [33] concluded that the exponent of the
Gutenberg-Richter law could vary for different styles of faulting. However, our data were
collected without screening to retain particular rake angles, and so it is not clear whether
our finding can be ascribed to their theory.

Why should the brain activities in Fig. 7(b) prefer double power laws? It may simply be due
to the fact that the left and right hemispheres of the human brain perform fairly distinct
sets of operations. As for the sand pile model in Fig. 7(d), the implication is slightly
trickier. Note from Table II that the two power-law terms are of opposite signs. This
implies an opposing mechanism that hinders and interferes with the “normal” process of
self-organized criticality. We suspect the culprit is the interaction between multiple sand
piles. The avalanche from one pile is sure to stack up at the foot of its neighboring piles
and “kill” the avalanches that could originally occur there. Per Bak pointed out that the
falloff at large cluster size ~ 200 for the 2-dimensional sand pile model is due to the
finite-size effect [8]. We have thus imposed an upper cutoff of 200 in Fig. 7(d) when
calculating the AIC value. In other words, the fact that DPL still performs better than the
simple power law implies either that the finite-size effect already exists in smaller
clusters or that there is a yet-unknown mechanism in the sand pile model besides
self-organized criticality.

Table II also reports ZMD to be more favored than the simple power law for the
duration-time frequency of solar flare, web link, protein-domain frequency, and stock-market
fluctuations. In fact, Eq. (3) in Ref. [33] already assumes the form of ZMD for the solar
flare rate, but the authors neglect the shift based on the assumption of long waiting time.
Adopting the same cut in the breakpoint as Ref. [33], the Akaike information criterion still
unveils ZMD as its true distribution. For the stock-market fluctuation, we followed Ref.
[31] in defining the normalized stock-price return, and analyzed the daily close price of
the New York Stock Exchange from December 31, 1965 to May 3, 2015. Data were downloaded from
http://finance.yahoo.com/q/hp?s NYA+Historical+Prices. We checked that the cumulative plot
and magnitude of the power-law exponent (if fit by SPL) were similar to those on pages 46
and 47 of the second work in Ref. [31].

FIG. 7: (color online) Notable power law claims that require modifications. The DPL (in red
solid line) outperforms SPL (in black dashed line) for (a) earthquakes [26], (b) brain
functional network [27], (c) solar flare intensity [11], and (d) 2-dimensional sand pile
model [28].

Because the Akaike information criterion contains the likelihood function, it is also
sensitive to the range of the parameter: the larger the range, the more effective the Akaike
information criterion is at discerning the performance of different fitting functions. In
order to vindicate the power-law claims in Table II, we chose the same range as their
respective references. Note that the difference in AIC values we obtained is large enough to
pass the more stringent likelihood ratio test, of which the statistical reasoning is
detailed in Appendix B. The test is to make sure that the evidence is strong enough to
dismiss the existing model (e.g., the SPL in this study) by requiring that the following
condition be met when the sample size is large [33]:

$-2 \ln \dfrac{L_1}{L_2} > \chi^2_{0.95}(k_2 - 1)$    (7)

where $L_1$ is the likelihood function for SPL and $\chi^2_{0.95}(k_2 - 1)$ equals 5.99
(3.84) when $L_2$ represents DPL (ZMD), which corresponds to $k_2 = 3$ (2). The condition,
Eq. (7), can be equivalently verified via

$\mathrm{AIC}_1 - \mathrm{AIC}_2 > \chi^2_{0.95}(k_2 - 1) - 2(k_2 - 1)$    (8)

which clearly holds for all cases in Table II. For example, in order for DPL to replace SPL
at describing the earthquakes, the required AIC difference is 1.99 according to Eq. (8). The
value obtained in Table II is 2.8, which is large enough to justify the rejection of SPL as
a null hypothesis. In the meantime, the minimum difference is 1.84 for ZMD to replace SPL.
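The thresholds quoted above follow from Eq. (8) and can be reproduced with the standard library alone. The sketch below is an illustration, not the authors' code: the function names are hypothetical, and the chi-square quantile is computed only for one and two degrees of freedom, where closed forms exist (the squared normal quantile and $-2\ln(1-p)$, respectively).

```python
import math
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Chi-square quantile for dof = 1 or 2, using closed forms."""
    if dof == 1:
        z = NormalDist().inv_cdf((1 + p) / 2)   # chi2(1) is a squared normal
        return z * z
    if dof == 2:
        return -2 * math.log(1 - p)             # chi2(2) is exponential, mean 2
    raise ValueError("this sketch handles only dof = 1 or 2")

def required_aic_gap(k2, k1=1, p=0.95):
    """Right-hand side of Eq. (8): chi2_p(k2 - k1) - 2 (k2 - k1)."""
    return chi2_quantile(p, k2 - k1) - 2 * (k2 - k1)

def passes_lrt(aic_simple, aic_complex, k2, k1=1, p=0.95):
    """True if the AIC difference is large enough to reject the simpler model."""
    return aic_simple - aic_complex > required_aic_gap(k2, k1, p)

print(round(required_aic_gap(3), 2))         # DPL versus SPL
print(round(required_aic_gap(2), 2))         # ZMD versus SPL
print(passes_lrt(133115.7, 133112.9, k2=3))  # earthquake row of Table II
```

With `k1 = 1` for SPL, the two gaps evaluate to the 1.99 and 1.84 quoted in the text, and the earthquake difference of 2.8 clears the DPL threshold.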
V. CONCLUSION AND DISCUSSION

Mason A. Porter for helpful comments.


We introduced the Akaike information criterion as a rigorous statistical gauge to determine
whether a power-law claim is legitimate. Combined with the stringent likelihood ratio test,
our statistical procedures determined that several famous power laws should be rejected and
replaced by double power laws, the same distribution as for the sound emission of a loose
crumpled-together ball. Since we designed the latter experiment, the physical reason why the
power-law ansatz is inappropriate is clear and can be explained. But, since we cannot be
experts in all the phenomena in the former case, which cover fields as diverse as
seismology, neuroscience, astronomy, social science, and financial markets, our conclusions
there are based solely on statistical evidence, information theory to be specific. In
addition, the crumpled-together data are sorted temporally to reveal a transition in the
statistical behavior from favoring double power laws when the inter-sheet interactions were
weak to a simple power law at high compactions. This observation confirms the general belief
and the theoretical predictions of Ref. [20] that interactions can bring about a subtle
change in macroscopic behavior.

We have to admit that we do not fully understand the physical implications of our findings;
e.g., what essential physics is missing in the previous models that predict the power-law
behavior, why there are so few examples of triple or higher-order power laws in our study,
why the 2-dimensional sand pile model should betray its role as a paradigm for
self-organized criticality and, more generally, how many more power laws in complex systems
are in fact wrongly identified.

Derived from the Kullback-Leibler discrepancy, the Akaike information criterion provides a
simple and effective way to select the best approximating model among competitors. With the
optimal property of asymptotic efficiency, it outperforms other selection criteria, such as
the Bayesian Information Criterion, for selecting predictive models pH]. Being based on
information theory, the Akaike information criterion is relevant to Landauer’s principle
[36] and to the recent interest in using entropy transfer [37] to quantify directed
statistical coherence between spatiotemporal processes. It can be said that the Akaike
information criterion is essentially an application of the Second Law of Thermodynamics
[22, 35], while Landauer’s principle is a simple logical consequence of the law. Although
the Akaike information criterion was originally intended for probability density functions,
we describe in Appendix C how it can be generalized to tackle dimensional data, such as
pressure versus volume for gases.
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Appendix A: Alternative and mathematically more 
rigorous derivations for the results in Sec. Ill 


The log-likelihood for fitting SPL 
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\uL = --\na 




implies 


d\nL a 
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51nL 1 _i 3 


(Al) 


By setting zeros for Eq.(Al), the maximum likelihood 
estimates /3 and a satisfies 


leading to a large-sample version: 
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where rm = a;-f . To obtain A/S , Eq.(l) gives 
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in which bi = ax~^ — rrii represents the fitting bias. 
When the discrepancy between two SPL is small, i.e., 
\bi\/mi < £ for a small e, the numerator and denomina¬ 
tor of Eq. (A2) can be approximated by 


1 


N 


_ i i i 

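$$
\begin{aligned}
\mathrm{den} &= \sum_i \hat{a}^2 x_i^{-2\hat{\beta}} (\ln x_i)^2 + \hat{a} \sum_i b_i x_i^{-\hat{\beta}} (\ln x_i)^2 \\
&= \sum_i \left(\hat{a} x_i^{-\hat{\beta}} - m_i + m_i\right)^2 (\ln x_i)^2 + \sum_i b_i \left(\hat{a} x_i^{-\hat{\beta}} - m_i + m_i\right) (\ln x_i)^2 \\
&= \sum_i (b_i + m_i)^2 (\ln x_i)^2 + \sum_i b_i (b_i + m_i) (\ln x_i)^2 \\
&= \sum_i m_i^2 (\ln x_i)^2 + 3 \sum_i b_i m_i (\ln x_i)^2 + 2 \sum_i b_i^2 (\ln x_i)^2 \\
&\simeq (1 + 3\varepsilon + 2\varepsilon^2) \sum_i m_i^2 (\ln x_i)^2
\end{aligned}
\tag{A3}
$$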


-I- rni)(lna 
















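since

$$
\Big| \sum_i b_i m_i (\ln x_i)^2 \Big| \le \sum_i |b_i|\, m_i (\ln x_i)^2 < \varepsilon \sum_i m_i^2 (\ln x_i)^2,
\qquad
\sum_i b_i^2 (\ln x_i)^2 < \varepsilon^2 \sum_i m_i^2 (\ln x_i)^2 .
$$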

i 


Plugging Eq.(A3) into Eq.(A2) leads to Eq.(3). 
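
The algebra behind the denominator expansion in Eq. (A3) can be checked numerically. The following is only a minimal sketch: the grid `xs`, the reference power law `ms`, and the fitted values `1.02` and `1.49` are hypothetical numbers chosen for illustration, not data from this work. It evaluates the denominator directly, evaluates the expanded form in terms of $m_i$ and $b_i$, and compares both with the bound $(1 + 3\varepsilon + 2\varepsilon^2) \sum_i m_i^2 (\ln x_i)^2$.

```python
import math

def den_two_ways(a_hat, beta_hat, xs, ms):
    """Evaluate the denominator directly and in its expanded form.

    Returns (direct, expanded, base, eps), where base is
    sum m_i^2 (ln x_i)^2 and eps is the largest |b_i| / m_i.
    """
    direct = expanded = base = eps = 0.0
    for x, m in zip(xs, ms):
        fit = a_hat * x ** (-beta_hat)   # fitted power law at x_i
        b = fit - m                      # fitting bias b_i
        w = math.log(x) ** 2             # (ln x_i)^2
        direct += fit ** 2 * w + a_hat * b * x ** (-beta_hat) * w
        expanded += (m * m + 3 * b * m + 2 * b * b) * w
        base += m * m * w
        eps = max(eps, abs(b) / m)
    return direct, expanded, base, eps

# Hypothetical example: reference power law x^-1.5, slightly different fit.
xs = [float(x) for x in range(2, 51)]
ms = [x ** -1.5 for x in xs]
direct, expanded, base, eps = den_two_ways(1.02, 1.49, xs, ms)
bound = (1 + 3 * eps + 2 * eps ** 2) * base
print(direct, expanded, bound, eps)
```

Because `eps` is taken as the largest relative bias, the factor $(1 + 3\varepsilon + 2\varepsilon^2)$ acts here as an upper bound on the ratio of the denominator to the unperturbed sum.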


Appendix B: Statistical reasoning behind the
likelihood ratio test of Eq. (7)

The AIC is commonly used to rank models; it is an
estimate of the Kullback-Leibler (KL) distance, which
measures the information loss, calculated from the empirical
data. To compare models, one should look at the difference
between the AIC values of the competing models rather than
at the AIC values themselves. The reasoning is explained via
the following toy example. Suppose we have data
$\{X_1, X_2, \ldots, X_n\}$ generated from some normal
distribution $N(\mu, \sigma^2)$. One would like to determine
which model is preferred: Model 1 with $\mu = 1$ or Model 2
with $\mu = 0$. Given the definition of the AIC in the
manuscript, the AIC value for Model $i$ can be simplified
as


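$$
\mathrm{AIC}_i = -2 \log L_i + 2k_i
= \underbrace{n \log(2\pi) + n}_{\text{constant}}
+ \underbrace{n \log\Big( \frac{1}{n} \sum_{j=1}^{n} (X_j - \mu_i)^2 \Big)}_{\text{goodness of fit}}
+ \underbrace{2k_i}_{\text{complexity}},
$$

where $i = 1, 2$, $\mu_1 = 1$, $\mu_2 = 0$, and $k_1 = k_2 = 1$ in this toy example.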
This expression consists of three parts: (a) a constant
term related to the sample size $n$ and the normalizing
factors (such as $2\pi$) in the normal density function; (b) a
key term that measures the goodness of fit; (c) the
degrees-of-freedom term indicating the model complexity.
The constant term plays no role in model comparison. In
particular, its scale is proportional to $n$ (the amount of
data), which is completely irrelevant to model comparison
but can be a dominant term in the AIC value when handling
large data sets. In contrast, the goodness-of-fit term and
the model complexity term are critical factors for model
comparison. Of these two terms, the goodness-of-fit term
grows with $n$ and will therefore dominate the model
complexity term in determining the model ranking for large
data sets. Though the descriptions given above are for a
particular model setting, similar arguments hold generally
for other model scenarios, including our case. Returning to
the results given in Tables I and II in the manuscript, the
differences in AIC are small compared to the AIC values
themselves, mainly because of the large sample size in our
examples. But, for the purpose of model comparison, we
should look only at the difference of AIC values between
models, without concern for the magnitude of an AIC
inflated by the constant term.
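
The toy example can be made concrete with a short numerical sketch. It assumes, as an illustration only, that the variance is the single free parameter ($k = 1$) and is replaced by its maximum likelihood estimate for each fixed mean; the sample size, the random seed, and the function name `aic_fixed_mean` are hypothetical choices. The sketch splits each AIC value into the three parts described above and shows that the constant part cancels in the difference.

```python
import math
import random

def aic_fixed_mean(data, mu, k=1):
    """AIC of a normal model with fixed mean mu and ML-estimated variance.

    Returns (aic, constant, fit, complexity) so the three parts can be
    inspected separately.
    """
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n  # ML variance
    constant = n * math.log(2 * math.pi) + n           # same for every model
    fit = n * math.log(sigma2_hat)                     # goodness of fit
    complexity = 2 * k                                 # model complexity
    return constant + fit + complexity, constant, fit, complexity

random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(10000)]  # true mean is 1

aic1, const1, fit1, _ = aic_fixed_mean(data, mu=1.0)   # Model 1
aic2, const2, fit2, _ = aic_fixed_mean(data, mu=0.0)   # Model 2
delta = aic2 - aic1
print(aic1, aic2, delta)
```

Both AIC values are dominated by the shared constant, while `delta` equals the difference of the goodness-of-fit terms alone, which is the quantity that ranks the two models.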

Equation (7) is a special case of a likelihood ratio
test based on an asymptotic distribution. The likelihood
ratio test is a standard hypothesis testing method in the
statistical literature. Generally speaking, it applies to
a nested scenario of statistical hypotheses: the null
hypothesis $H_0: \theta \in \Theta_0$ vs. the alternative
hypothesis $H_1: \theta \in \Theta$, where
$\Theta_0 \subset \Theta$. Let $L_1$ and $L_2$ be the
maximum likelihood values under the reduced model $H_0$
and the general model $H_1$, respectively. Based on
asymptotic theory, $-2\log(L_1/L_2)$ has a $\chi^2$
distribution with $\dim(\Theta) - \dim(\Theta_0)$ degrees of
freedom under $H_0$. In our case, $H_0$ refers to the SPL
model and $H_1$ refers to either the DPL or the ZMD model.
Since $\dim(\Theta_0) = 1$ for SPL and
$k_2 = \dim(\Theta) = 2$ for DPL (or 3 for ZMD),
$-2\log(L_1/L_2)$ would behave like a $\chi^2(k_2 - 1)$
distribution if SPL is the underlying true model.
Consequently, observing a large value of $-2\log(L_1/L_2)$
relative to the $\chi^2(k_2 - 1)$ distribution indicates
that SPL is not plausible for the data.

Let $\mathrm{AIC}_1$ and $L_1$ represent the AIC and maximum
likelihood value for fitting the SPL model, and
$\mathrm{AIC}_2$ and $L_2$ represent their counterparts for
fitting the DPL ($k_2 = 2$) or ZMD ($k_2 = 3$) model. By
definition, we have

$$
\mathrm{AIC}_1 = -2\log L_1 + 2; \qquad \mathrm{AIC}_2 = -2\log L_2 + 2k_2 .
$$

Their difference satisfies

$$
\begin{aligned}
\mathrm{AIC}_1 - \mathrm{AIC}_2 &= (-2\log L_1 + 2) - (-2\log L_2 + 2k_2) \\
&= -2\log(L_1/L_2) + 2(1 - k_2),
\end{aligned}
$$

which implies an expression equivalent to Eq. (7) from the
corresponding inequality in the manuscript.
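
The relation between the AIC difference and the likelihood ratio statistic can be sketched in a few lines. This is an illustrative sketch, not the procedure used in the manuscript: the function names and the AIC values in the example are hypothetical. It uses the closed forms of the $\chi^2$ survival function for one and two degrees of freedom, which are the two cases that arise here (SPL against DPL, and SPL against ZMD).

```python
import math

def lr_statistic(aic1, aic2, k2, k1=1):
    """Recover -2 log(L1/L2) from two AIC values.

    Uses AIC1 - AIC2 = -2 log(L1/L2) + 2 (k1 - k2).
    """
    return (aic1 - aic2) + 2 * (k2 - k1)

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 is implemented in this sketch")

# Hypothetical AIC values for SPL (k = 1) and DPL (k2 = 2).
stat = lr_statistic(aic1=100.0, aic2=90.0, k2=2)
p_value = chi2_sf(stat, df=2 - 1)
print(stat, p_value)
```

A small `p_value` means that the observed statistic is large relative to the $\chi^2(k_2 - 1)$ distribution, so the SPL model is not plausible for the data.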


Appendix C: Use of the Akaike information criterion 
for non-probability functions 

It is always preferable to work with the raw data.
However, if they are not available, or when the data refer
to (pressure, volume) of a gas or (pressure, mass density)
of a crumpled ball, the AIC definition of the main text
cannot be used directly. The reasons are simple: for one
thing, without information on $n$, it is meaningless to
compare $\ln L$ with $2k$ because the expression now carries
units. Instead, we appeal to the underlying likelihood
function for the Akaike information criterion.

First, suppose we know a priori that the data $(x_i, y_i)$
are close to the set $(x_i, m_\theta(x_i))$ with small
errors, where $m_\theta(x_i)$ contains unknown parameters
$\theta = (\theta_1, \theta_2, \ldots)$. We then assume that
the errors, $\delta_i = y_i - m_\theta(x_i)$, obey the normal
distribution and use the Gaussian likelihood. The normal
assumption often holds for experimental data, in particular
when the data come from grouping or averaging, owing to the
central limit theorem. Under this
fairly general assumption, the likelihood becomes

$$
L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_i - m_\theta(x_i))^2}{2\sigma^2} \right]. \tag{C1}
$$

By setting $\partial L/\partial\sigma = 0$, it is straightforward to show that
the variance maximizing the likelihood is nothing but the
mean squared error:

$$
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - m_\theta(x_i))^2, \tag{C2}
$$

subject to $\theta$. By using this information and taking the
logarithm, Eq. (C1) reduces, up to an additive constant that
is the same for all models, to

$$
\ln L = -(N/2) \ln(\sigma^2) - N/2. \tag{C3}
$$

The constant term $N/2$ is irrelevant when comparing
different AIC values. The AIC then has a succinct form

$$
\mathrm{AIC} = 2k + N \ln(\sigma^2), \tag{C4}
$$

where $\sigma^2$ is the minimizer of Eq. (C2), which turns out to
be equivalent to the familiar method of least squares in a
regression.

If the data are accompanied by known error bars $\sigma_i$,
the likelihood function in Eq. (C1) should be modified as

$$
L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y_i - m_\theta(x_i))^2}{2\sigma_i^2} \right], \tag{C5}
$$

with the width of the Gaussian function input from
experimental observations. Taking the logarithm now leads to
the form $\ln L = C - (1/2) \sum_i (y_i - m_\theta(x_i))^2/\sigma_i^2$,
where the constant $C$ is again not important, and
maximizing the second term is equivalent to minimizing the
weighted least squares:

$$
\chi^2 = \sum_{i=1}^{N} (y_i - m_\theta(x_i))^2/\sigma_i^2 \tag{C6}
$$

subject to $\theta$. The AIC value becomes

$$
\mathrm{AIC} = 2k + \chi^2. \tag{C7}
$$

As a close connection, the minimizer of Eq. (C6) can also
be used to perform a goodness-of-fit test.
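
The two AIC forms for dimensional data, Eq. (C4) for unknown errors and Eq. (C7) for known error bars, can be sketched as follows. This is a minimal illustration: the function names and the sample numbers are hypothetical, and the fitted values `fits` are assumed to come from a separate least-squares or weighted least-squares fit of $m_\theta$ with $k$ parameters.

```python
import math

def aic_least_squares(ys, fits, k):
    """AIC = 2k + N ln(sigma^2), with sigma^2 the mean squared error."""
    n = len(ys)
    sigma2 = sum((y - f) ** 2 for y, f in zip(ys, fits)) / n
    return 2 * k + n * math.log(sigma2)

def aic_chi2(ys, fits, sigmas, k):
    """AIC = 2k + chi^2 for data with known error bars sigma_i."""
    chi2 = sum(((y - f) / s) ** 2 for y, f, s in zip(ys, fits, sigmas))
    return 2 * k + chi2

# Hypothetical numbers, for illustration only.
a_ls = aic_least_squares([1.0, 3.0], [2.0, 2.0], k=1)
a_chi = aic_chi2([1.0, 2.0], [1.5, 1.0], [0.5, 1.0], k=1)
print(a_ls, a_chi)
```

Only differences of these values between candidate models carry meaning. Note that `aic_least_squares` is undefined for a perfect fit, since the logarithm of a zero mean squared error diverges.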


For the scenarios this appendix is aimed at, the raw
data either have been gathered into a histogram or carry
units. As a result, the sample size is normally limited,
i.e., the number of bins $N$ might not be much larger than
$k$. In these cases the Akaike information criterion tends
to pick an over-fit model and needs a bias correction
[38, 39], given by AICc:

$$
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{N - k - 1}, \tag{C8}
$$

with a correction term that goes to zero when $N \gg k$.
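
The small-sample correction of Eq. (C8) is simple to apply in practice. The sketch below is illustrative only; the function name `aicc` and the numbers in the example are hypothetical.

```python
def aicc(aic, k, n):
    """Bias-corrected AIC: AIC + 2k(k+1)/(N - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires N > k + 1")
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

# Hypothetical example: 9 bins and a 2-parameter model.
small_sample = aicc(10.0, k=2, n=9)
large_sample = aicc(10.0, k=2, n=10**9)
print(small_sample, large_sample)
```

With few bins the correction is comparable to the AIC differences that decide the ranking, whereas for a very large number of bins it becomes negligible.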